Apache Lucene: From Text Indexing to Artificial Intelligence

Lucian Precup • Location: Theater 5 • Haystack 2024

Apache Lucene celebrated its twenty-second anniversary last September, and it continues to profoundly shape search and data technologies. Lucene is the engine behind giants such as Elasticsearch, OpenSearch, Apache Solr, and the more recent Atlas Search from MongoDB. Its integration into numerous other open source projects, such as Apache Nutch (the pioneering web crawler and precursor to Hadoop) and Apache Cassandra (a highly scalable NoSQL database), attests to its widespread influence. Used in thousands of enterprise projects, including at leaders like LinkedIn and Twitter, Lucene enjoys a solid and diverse user base. This talk will dive into Lucene’s evolution, from the inverted index at the core of its text processing to recent innovations that reflect continuous technological advancement. To conclude, we will discuss Lucene’s latest features, vector indexing and vector search, which pair naturally with generative artificial intelligence and open new horizons for the future of search.

Lucian Precup

Adelean

Lucian Precup is the CTO of [all.site](https://all.site/), the collaborative search engine developed at [Station F](http://stationf.co) in Paris. With his colleagues at [Adelean](http://adelean.com), Lucian develops solutions for indexing, searching, and analyzing data. He regularly shares his knowledge at specialized conferences and organizes the [Search, Data & AI Meetup](https://www.meetup.com/fr-FR/search-and-data/).